Computational Lexicography and Lexicology: ELexBI, a Basic Tool for Bilingual Term Extraction from Spanish-Basque Parallel Corpora
Abstract
We present the work done by the Elhuyar Foundation in the field of bilingual terminology extraction. The aim of this work is to develop techniques for the automatic extraction of pairs of equivalent terms from Spanish-Basque translation memories, and to implement those techniques in a prototype. Our approach is based on a monolingual extraction of term candidates in each language, then the creation of candidate bigrams from both segments of the same translation unit, and, finally, the selection of the most likely pair of candidates, mainly by the use of statistical information (association measures) and cognates. In the first step, we use linguistic techniques for the extraction of term candidates. The result of our work is ELexBI, a prototype tool that can extract equivalent terms from Spanish-Basque translation memories. This work is intended as a contribution to corpus-based bilingual lexicography and terminology in Basque.

1 Objective

The aim of this work is to develop and apply techniques for the automatic extraction of pairs of equivalent terms from Spanish-Basque translation memories, and to implement those techniques in a user-friendly prototype. This work belongs to a wider research area: the extraction of equivalent terms from translation memories can be seen as a particular case of the extraction of lexical equivalences from parallel corpora. In this first stage of development, the translation memories that we use as input are the product of translators' work; that is to say, the alignment at sentence level is 100% correct. In future work, automatically aligned memories and parallel corpora aligned at document level will be used. As for the type of term equivalents we attempt to find, we deal with one-word and multiword terms that have noun-phrase structure. Equivalences between one-word and multiword terms are also taken into account.
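The selection step outlined in the abstract (candidate pairs from the same translation unit, ranked by an association measure and by cognate evidence) can be illustrated with a small sketch. It is not ELexBI's implementation: the text does not say which association measure or cognate test the tool uses, so the Dice coefficient, the longest-common-subsequence ratio, the equal weighting of the two scores, and all function names below are assumptions chosen for illustration. The term candidates are taken as already extracted.

```python
from collections import Counter
from itertools import product


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def cognate_score(es_term: str, eu_term: str) -> float:
    """Spelling similarity in [0, 1] (LCS ratio; an assumed cognate test)."""
    a, b = es_term.lower(), eu_term.lower()
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def dice_scores(units):
    """Dice coefficient for every candidate pair seen in the same unit.

    `units` is a list of (spanish_candidates, basque_candidates) tuples,
    one per translation unit (hypothetical input format).
    """
    es_freq, eu_freq, pair_freq = Counter(), Counter(), Counter()
    for es_cands, eu_cands in units:
        es_set, eu_set = set(es_cands), set(eu_cands)
        es_freq.update(es_set)
        eu_freq.update(eu_set)
        pair_freq.update(product(es_set, eu_set))
    return {
        (es, eu): 2 * n / (es_freq[es] + eu_freq[eu])
        for (es, eu), n in pair_freq.items()
    }


def best_equivalents(units, cognate_weight=0.5):
    """For each Spanish candidate, keep the Basque candidate with the
    highest combined score (weighted sum; the weighting is an assumption)."""
    best = {}
    for (es, eu), dice in dice_scores(units).items():
        score = (1 - cognate_weight) * dice + cognate_weight * cognate_score(es, eu)
        if es not in best or score > best[es][1]:
            best[es] = (eu, score)
    return best
```

Called on a list of translation units, `best_equivalents` returns a dictionary mapping each Spanish candidate to its top-ranked Basque candidate and score. A real system would probably also apply frequency thresholds and a minimum score before accepting a pair.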
2 Extraction process

Different approaches have been proposed for the extraction of lexical correspondences from parallel corpora. Most of them are closely related to the task of word-level alignment. Following Tiedemann (2003) and Kraif (2002a, 2002b), we treat the extraction of lexical correspondences as a distinct but closely related task.
Similar resources
ELexBI, A BASIC TOOL FOR BILINGUAL TERM EXTRACTION FROM SPANISH-BASQUE PARALLEL CORPORA
We present the work done by Elhuyar Foundation in the field of bilingual terminology extraction. The aim of this work is to develop some techniques for the automatic extraction of pairs of equivalent terms from Spanish-Basque translation memories, and to implement those techniques in a prototype. Our approach is based on a previous monolingual extraction of term candidates in each language, the...
Learning Spanish-Galician Translation Equivalents Using a Comparable Corpus and a Bilingual Dictionary
So far, research on extraction of translation equivalents from comparable, non-parallel corpora has not been very popular. The main reason was the poor results when compared to those obtained from aligned parallel corpora. The method proposed in this paper, relying on seed patterns generated from external bilingual dictionaries, allows us to achieve similar results to those from parallel corpus...
Evaluating the LIHLA lexical aligner on Spanish, Brazilian Portuguese and Basque parallel texts
Alignment of words and multiword units plays an important role in many natural language processing applications, such as example-based machine translation, transfer rule learning for machine translation, bilingual lexicography, word sense disambiguation, etc. In this paper we describe LIHLA, a lexical aligner which uses bilingual probabilistic lexicons generated by a freely available set of too...
Un Método de Extracción de Equivalentes de Traducción a partir de un Corpus Comparable Castellano-Gallego
So far, research on extraction of word translations from comparable, non-parallel corpora has not been very popular. The main reason was the poor results when compared to those obtained from aligned parallel corpora. The method proposed in this paper, relying on seed contexts generated from external bilingual dictionaries, allows us to achieve results similar to those from parallel corpus. In t...
Automatic Extraction of English Collocations and their Chinese-English Bilingual Examples: A Computational Tool for Bilingual Lexicography
This paper describes the procedures involved in developing EXEC, a web-based system which can automatically extract English collocations and their Chinese-English bilingual examples from parallel corpora. The system draws on statistics, dependency parsing, and Chinese-English parallel corpora of more than 13 million English words and 27 million Chinese characters. By taking a word as well as th...
Publication date: 2008